The Termolator: Terminology Recognition based on Chunking, Statistical and Search-based Scores
Authors
Abstract
The Termolator is a high-performing terminology extraction system, which will soon be available as open source software. It combines several different approaches to achieve superior coverage and accuracy. The system identifies potential instances of terminology using a chunking procedure, similar to noun group chunking, but favoring chunks that contain out-of-vocabulary words, nominalizations, technical adjectives, and other specialized word classes. It ranks these term chunks according to several metrics, including: (a) a set of metrics that favor term chunks that are relatively more frequent in a “foreground” corpus about a single topic than in a “background” or multi-topic corpus, and (b) a relevance score that measures how often terms appear in articles and patents in a Yahoo web search. We analyse the contributions made by each of these metrics and show that all modules contribute to the system’s performance, both in the number and in the quality of terms identified.
Workshop Topic: Terminology Extraction
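The foreground/background idea in (a) can be illustrated with a minimal sketch. The abstract does not specify the Termolator's actual metrics, so the smoothed relative-frequency ratio below is an assumption chosen for illustration, and the function name `rank_terms`, the `smoothing` parameter, and the toy corpora are all hypothetical. Candidate terms are assumed to have been extracted already by some chunker.

```python
from collections import Counter


def rank_terms(foreground_docs, background_docs, smoothing=1.0):
    """Rank candidate terms by a smoothed foreground/background frequency ratio.

    Each document is a list of candidate term strings. This is a generic
    illustration, not the Termolator's own scoring.
    """
    fg = Counter(term for doc in foreground_docs for term in doc)
    bg = Counter(term for doc in background_docs for term in doc)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(fg) | set(bg))

    scores = {}
    for term, count in fg.items():
        # Add-k smoothing keeps terms that are absent from the background
        # corpus from causing a division by zero.
        fg_rate = (count + smoothing) / (fg_total + smoothing * vocab_size)
        bg_rate = (bg[term] + smoothing) / (bg_total + smoothing * vocab_size)
        scores[term] = fg_rate / bg_rate

    # Highest ratio first: terms most characteristic of the foreground topic.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpora (hypothetical): a single-topic foreground and a mixed background.
foreground = [["gene expression", "protein", "result"],
              ["gene expression", "result"]]
background = [["result", "market"],
              ["result", "protein", "election"]]

ranked = rank_terms(foreground, background)
print(ranked[0][0])
```

In this toy example "gene expression" ranks first because it never appears in the background corpus, while "result" and "protein" occur at similar rates in both corpora and score close to 1.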
Similar resources
Search Space Reduction for Farsi Printed Subwords Recognition by Position of the Points and Signs
In the field of word recognition, three approaches are used: word isolation, overall shape, and a combination of the two. Most optical recognition methods recognize a word by breaking it into its letters and then recognizing them. This approach faces problems because of the difficulty of isolating letters and its limited recognition accuracy in texts with low image quality. Therefo...
Relationship Between Hospital Cost Based on Current Procedural Terminology and Out-of-Pocket Payment of Oil Company Retirees
Marzie Afshoon¹, Leila Riahi²*, Leila Nazarimanesh³. ¹ Department of Health Services Management, Science and Research Branch, Islamic Azad University, Tehran, Iran. Abstract. Introduction: This study aimed to investigate the relationship between hospital cost based ...
Fast Estimation of Warping Factors in Vocal Tract Length Normalization Using the Score Obtained from Gender Detection Modeling
The performance of automatic speech recognition (ASR) systems is adversely affected by variations in speakers, audio channels, and environmental conditions. Making these systems robust to such variations remains a major challenge. One of the main sources of speaker variation is the difference in Vocal Tract Length (VTL) between speakers. Vocal Tract Length Normalization (VTLN) is an effe...
Pattern Recognition in Control Chart Using Neural Network based on a New Statistical Feature
Today, to speed up the identification and timely correction of process deviations, advanced techniques are needed to minimize the cost of producing defective products. To this end, control charts, one of the important tools of statistical process control, have been used in combination with modern tools such as artificial neural networks. The artificial neural netw...
Efficient Support Vector Classifiers for Named Entity Recognition
Named Entity (NE) recognition is a task in which proper nouns and numerical information are extracted from documents and are classified into categories such as person, organization, and date. It is a key technology of Information Extraction and Open-Domain Question Answering. First, we show that an NE recognizer based on Support Vector Machines (SVMs) gives better scores than conventional syste...